Abstract

Assessment of health-related quality of life (HRQL) is particularly important in patients with progressive and incurable 
diseases such as idiopathic pulmonary fibrosis (IPF). The St George's Respiratory Questionnaire (SGRQ) has frequently 
been used to measure HRQL in patients with IPF, but it was developed for patients with obstructive lung diseases. The 
aim of this review was to examine published data on the psychometric performance of the SGRQ in patients with IPF. 
A comprehensive search was conducted to identify studies reporting data on the internal consistency, construct 
validity, test-retest reliability, and interpretability of the SGRQ in patients with IPF, published up to August 2013. In total, 
data from 30 papers were reviewed. Internal consistency was moderate for the SGRQ symptoms score and excellent 
for the SGRQ activity, impact and total scores. Validity of the SGRQ symptoms, activity, impact and total scores was 
supported by moderate to strong correlations with other patient-reported outcome measures and with a measure of 
exercise capacity. Most correlations were moderately strong between SGRQ activity or total scores and forced or static 
vital capacity, the most commonly used marker of IPF severity. There was evidence that changes in SGRQ domain and 
total scores could detect within-subject improvement in health status, and differentiate groups of patients whose 
health status had improved, declined or remained unchanged. Although the SGRQ was not developed specifically for 
use with patients with IPF, on balance, its psychometric properties are adequate and suggest that it may be a useful 
measure of HRQL in this patient population. However, several questions remain unaddressed, and further research is 
needed to confirm the SGRQ's utility in IPF. 

Keywords: Idiopathic pulmonary fibrosis, Patient-reported outcomes, PROs, St George's Respiratory Questionnaire, 
SGRQ, Health-related quality of life, HRQL, Psychometrics, Validity, Reliability 



Introduction 

Idiopathic pulmonary fibrosis (IPF) is a specific form of 
fibrosing interstitial pneumonia characterized by pro- 
gressive worsening of dyspnea and lung function [1]. In 
the United States, the annual incidence of IPF has been 
estimated as 6.8-8.8 cases per 100,000 using narrow 
case definitions (requiring a definite pattern of Usual 
Interstitial Pneumonia [UIP] on high-resolution com- 
puted tomography [HRCT]), and as 16.3-17.4 cases per 
100,000 using broad case definitions (including patients 
with a possible UIP-pattern on HRCT) [2]. Although IPF
has a poor prognosis, with a median survival time from
diagnosis of 2 to 3 years, the clinical course of IPF varies 
considerably [1,3]. Symptoms experienced by patients 
with IPF include non-productive cough, fatigue and 
chronic dyspnea, with the latter being the most promin- 
ent and disabling [4]. The morbidity associated with IPF 
has a broad and profound impact on patients' health- 
related quality of life (HRQL) [4,5]. 

As IPF is a progressive disease with no cure, HRQL 
and other patient-centered outcomes are important end- 
points to evaluate in research and clinical practice [6]. 
Although no disease-specific measure of HRQL has been 
established as suitable for longitudinal research in pa- 
tients with IPF, several HRQL instruments (and others, 
including symptom and generic quality of life questionnaires)
have been used [7,8]. Which patient-centered
instrument(s) (including HRQL questionnaires) to use in
a particular study depends on a number of factors, in- 
cluding the design of the study, the intervention being 
assessed, the hypotheses being tested, and the character- 
istics of the comparator group (general population, 
patients with IPF of different severity, patients with another
disease, etc.). In any situation, whether a generic
HRQL instrument might perform as well as or better than
a disease-specific HRQL instrument is uncertain.

In this review, we focused on the St George's Respiratory
Questionnaire (SGRQ). Although originally developed for 
use in patients with chronic obstructive pulmonary disease 
(COPD) and asthma [8], it has frequently been used to 
evaluate HRQL in patients with IPF. The SGRQ is a 50- 
item questionnaire split into three domains: symptoms 
(assessing the frequency and severity of respiratory symp- 
toms), activity (assessing the effects of breathlessness on 
mobility and physical activity), and impact (assessing the 
psychosocial impact of the disease) [9]. Scores are 
weighted such that every domain score and the total score 
range from 0 to 100, with higher scores indicating a 
poorer HRQL. 

The aim of this review was to assess the appropriate- 
ness of the SGRQ for measuring HRQL in patients with 
IPF by examining the evidence relating to the psycho- 
metric performance of the SGRQ in this population. A 
revised version of the SGRQ, the SGRQ-I, has been de- 
veloped for use in patients with IPF [10]; however, stud- 
ies assessing this tool are limited, and SGRQ-I data are 
not covered in this manuscript. 



Methods 

Search strategy and data extraction 

A comprehensive literature review was conducted to 
identify articles that evaluated the psychometric proper- 
ties of the SGRQ in patients with IPF. Following a 
PubMed search (see Additional file 1), articles were ex- 
cluded if they were not published between 1 January 
1991 (date of first publication of the SGRQ) and 31 
August 2013, were not published in English, did not re- 
port data on the psychometric properties of the SGRQ 
in patients with IPF or duplicated clinical trial data re- 
ported in another article (Figure 1). Data extracted from 
the studies included study characteristics (country, dur- 
ation, design, sample size), participant characteristics 
(age, gender, time since diagnosis, forced vital capacity 
[FVC]% predicted, diffusing capacity for carbon monox- 
ide [DLco]% predicted) and results of the psychometric 
tests. 

Figure 1 Selection of articles to be included in the review.

PubMed search (see Appendix): 466 articles
    116 articles excluded:
    • Not in English
    • Published prior to 1991
350 articles
    280 articles excluded:
    • Animal or in vitro study
    • Patient population did not include patients with IPF
70 articles
    40 articles excluded:
    • Study was not conducted primarily in patients with
      IPF or did not report results for patients with IPF separately
    • No results reported on psychometric performance
      of SGRQ in patients with IPF
    • Article duplicated clinical trial results reported in
      another article
30 articles included in the review

Articles were selected that assessed any of the following
psychometric properties of the SGRQ: internal
consistency, convergent validity, known groups validity,
test-retest reliability (reproducibility), responsiveness,
minimal important difference (MID), and floor and ceiling
effects [11]. Internal consistency refers to the degree
to which the individual items within an instrument correlate
with each other (i.e., tap the same underlying construct).
This is determined using Cronbach's coefficient
alpha, with >0.70 considered to indicate acceptable internal
consistency for a multi-dimensional instrument.
Convergent validity describes the degree to which two
measures, hypothesized to measure the same construct,
correlate. Known groups validity refers to the extent to
which scores on an instrument distinguish groups that
differ on a key variable, usually clinical in nature. For the
described validity measures, correlations were regarded
as weak if <0.30, moderate if 0.30-0.60, and strong
if >0.60 [12]. Test-retest reliability assesses the ability of
an instrument to produce consistent scores over repeated
measurements in patients who are clinically stable.
Responsiveness assesses the ability of an instrument to detect
change in individuals who are hypothesized to have changed
on the underlying construct (HRQL) and who are
known to have experienced change in clinical status. MID
estimates identify the smallest difference in the score on
an instrument that patients perceive as important. Floor
and ceiling effects are limitations that occur when an individual
scores at the extremes of an instrument; if a patient's
score is the lowest or highest possible value, the
instrument is unable to detect a reduction or increase,
respectively.

Results 

A total of 30 papers were included in the review (Figure 1; 
Table 1). 

Internal consistency 

Data from a clinical trial of bosentan have been used to
determine the internal consistency of the SGRQ in patients
with IPF. Cronbach's alpha was 0.66 for the symptoms
score and >0.84 for each of the SGRQ activity,
impact and total scores [10,34].
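
As a rough illustration of how an internal consistency coefficient of this kind is obtained, the sketch below computes Cronbach's alpha from item-level responses using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of the summed score). The function name `cronbach_alpha` and the response matrix are hypothetical; SGRQ items carry weights, and the cited analyses [10,34] may have handled scoring differently.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances throughout; requires at least two items and
    at least two respondents.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents (columns of the matrix).
    item_variance_sum = sum(variance(column) for column in zip(*item_scores))
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents x 3 items (not SGRQ data).
responses = [[3, 4, 3], [2, 2, 3], [4, 5, 5], [1, 2, 1], [3, 3, 4]]
print(round(cronbach_alpha(responses), 2))
```

A value above 0.70 would meet the acceptability threshold described in the Methods.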

Convergent validity 

Convergent validity was evaluated by extracting cross- 
sectional and longitudinal correlations between SGRQ 
scores and other patient-reported outcome measures 
(Table 2), an assessment of exercise capacity (Table 3), 
pulmonary function tests (PFTs) or partial pressure of 
arterial oxygen (Table 4), and assessments of fibrotic ab- 
normalities on HRCT (Table 5). 

Patient-reported outcome measures 

In nine studies, investigators provided information on the 
correlation between SGRQ scores and other patient- 
reported outcome measures (BDI [Baseline Dyspnea 
Index], D-12 [Dyspnea-12], K-BILD [King's Brief Intersti- 
tial Lung Disease questionnaire], UCSD-SOBQ [University 
of California San Diego Shortness of Breath Question- 
naire], CQLQ [Cough Quality of Life Questionnaire], a 
single-item dyspnea assessment, SF-36 Physical Compo- 
nent Summary score [SF-36 PCS] and the Borg Dyspnea 
Index) (Table 2). Moderate to strong correlations were ob- 
served between the SGRQ total score and the total scores 
on these instruments (Table 2). In general, moderate to 



strong correlations were observed between SGRQ domain 
scores and the total scores on these instruments. Likewise, 
moderate to strong correlations were observed between 
SGRQ domain or total scores and the total, physical com- 
plaints, extreme physical complaints, and functional abil- 
ity sub-scale scores of the CQLQ (r = 0.34 to 0.81) [21], 
the total and sub-scale scores of the K-BILD (r = -0.59 
to -0.89) [27], the SF-36 PCS, a composite score measur- 
ing overall physical health (r = -0.52 to -0.74) [10] and the 
Borg Dyspnea index (r = 0.35 to 0.56) [10,15]. For most 
measures and their sub-scales, correlations were weakest 
with the SGRQ symptoms score (when compared with 
other SGRQ domains or the total score). 
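
The strength labels used throughout these results follow the thresholds given in the Methods (weak if <0.30, moderate if 0.30-0.60, strong if >0.60). A minimal sketch of that rule is shown below; the function name `classify_correlation` is hypothetical, and applying the thresholds to the absolute value of r is an assumption made so that negative correlations are handled the same way as positive ones.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the review's thresholds.

    Assumption: the thresholds apply to |r|, so the sign is ignored.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude < 0.30:
        return "weak"
    if magnitude <= 0.60:
        return "moderate"
    return "strong"


# Coefficients quoted in the text: CQLQ (0.34, 0.81), K-BILD (-0.59, -0.89).
for r in (0.34, 0.81, -0.59, -0.89):
    print(r, classify_correlation(r))
```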

In two studies, investigators evaluated correlations be- 
tween SGRQ change scores and change scores from other 
patient-reported outcome measures (Table 2). In one 
study, correlations were moderately strong between 
change scores for the SGRQ activity, impact and total 
scores and change scores from the single-item dyspnea as- 
sessment (r = 0.59, 0.56 and 0.45, respectively) [28]. In the 
other study, investigators found that the correlation be- 
tween the BDI change score and SGRQ total change score 
was -0.29 and not significant [25]. However, the BDI was 
designed to measure dyspnea severity at a single point in 
time and not to measure change in dyspnea severity [42].

Measures of exercise capacity 

Correlation coefficients between SGRQ scores and a 
measure of exercise capacity are presented in Table 3. Dis- 
tance covered during the 6-minute walk test (6MWD) is 
frequently used as a measure of exercise capacity in pa- 
tients with IPF, and change in 6MWD has been shown to 
be a predictor of mortality in these patients [16]. In five 
cross-sectional studies in patients with IPF, investigators 
examined the relationship between the SGRQ total score 
and 6MWD. The strength of these correlations was mod- 
erate to strong in three (-0.45 to -0.72) [15,28,40] and 
weak in two (-0.26 and -0.28) [10,16] studies. In four 
cross-sectional studies, investigators examined the rela- 
tionship between the SGRQ domain scores and the 
6MWD [10,28,39,40]; the strength of these correlations 
was moderate to strong for the activity score in all four 
studies (r = -0.32 to -0.72), moderate to strong for the im- 
pact score (r = -0.41 to -0.63) and moderate for the symp- 
toms score (r = -0.32 to -0.41) in three studies. In three 
studies, investigators examined the relationship between
change in the SGRQ total score and change in 6MWD
[16,25,28]; correlation coefficients ranged from -0.23 to -0.43.

Pulmonary function tests and arterial blood gas analysis 

Table 4 presents correlations between SGRQ scores and 
either PFTs or arterial blood gas analysis in patients with 
IPF. All correlations between the SGRQ total score 
and these variables were moderate to strong (r = -0.30 



Table 1 Studies included in this review 



Study 



Study type 



Experimental Country 
treatment 



Sample Disease duration 
size^ (mo) mean (SD) 



Baseline SGRQ score^ 



Baseline spirometric values^ 



All IPF 



FVC% predicted 



DLco% 
predicted 



Antoniou ef al., 
2006 [13] 

Berry et ai, 
2012 [14] 

Chang ef al., 
1999 [15] 



du Bois ef al., 

2011 [16] 

Han ef al., 
2013 [17] 

Horton ef al, 

2012 [18] 



King, Jr. ef al., 

2008 [1 9] 

King, Jr. ef al., 

2009 [20] 

Lechitzin ef al., 
2013 [21] 



Mishra ef al., 
2011 [22] 

Naji ef al., 
2006 [23] 

Nishiyama ef al., 
2005 [24] 



RCT 

Secondary 
validation study 

Standalone 
validation study 



RCT 
RCT 
RCT 

RCT 
RCT 
RCT 



nterferon 
gamma b 

n/a 
n/a 



nterferon 
gamma b 

Sildenafil 
Thalidomide 



Bosentan 

nterferon 
gamma b 

Thalidomide 



Greece 50 50 7 = 494(24.3) 

C = 42.7 (1 6.8) 

US 405 239 - 

US 50 33 - 



Multi-national 822 822 - 

US 119 119 204 

US 23 23 20.5 (3-59) 

Multi-national 158 158 7= 12.2 (12.2) 
C = 12.1 (12.0) 

Multi-national 826 826 - 

US 24 24 - 



Within-subject trial Oral doxycycline India 

Within-subject trial Pulmonary Ireland 
rehabilitation 



Standalone n/a 
validation study 



Japan 



6 6 
26 19 
41 41 



Total = 38.9 [28.7-55.2] 
Symptoms domain = 50.5 [31 .5-69.8], 
Activity domain = 544 [39.9-72.9], 
Impact domain = 28.4 [18.8-45.1] 

Total = 41.8 (18) 



Total = 574 (18.8) 
Symptoms domain = 67.7 (1 9.7), 
Activit/ domain = 64.3 (22.7), 
Impact domain =48.1 (20.7) 

T-Total = 45.7 (1 8.1 ), C-Total = 45.2 (1 9) 



T-Total = 41 .6 (1 7.9), C-Total = 42.4 (1 8.2) 

Total = 574 (18.8), 
Symptoms domain = 67.7 (1 9.7), 
Activity domain = 64.3 (22.7), 
Impact domain =48.1 (20.7) 

Total = 50.90 (8.38) 



Total = 



1 (27.6, 67.9) 



Total = 35.7 (20.6) [range 1.6-77.6], 
Symptoms domain = 40.1 (24.5) 
[range 4.4-85.6], Activity domain = 44.5 

(26.7) [range 0-93.9], Impact domain = 28.9 

(19.8) [range 0-77.0] 



T = 71.8 (15.0) 
C = 70.7 (17.7) 

66.0 (52-78) 
65.0 (49.0-81.0) 

72.5 (12.7) 
56.9 

70.4 (13.7) 



T = 65.9 (10.5), 
C = 695 (12.6) 

T = 72.2 (12.3), 
C = 73.1 (13.4) 

70.4 (13.7) 



n/a 

66.7 (20.7) 
n/a 



49.0 (365-59.8) 

47.4 (9.2) 
26.0 

574 (14.4) 



T = 42.3 (9.5), 
C = 41.4 (9.5) 

T = 474 (9.2), 
C = 47.3 (9.3) 

57.4 (14.4) 



n/a 

425 (14) 
n/a
Nishiyama et al.,
2008 [25]



RCT 



Noth ef a/., 
2012 [26] 

Patel et al., 
2012 [27] 

Peng ef al., 
2008 [28] 



Raghu ef al., 
2004 [29] 

Raghiu ef al., 

2008 [30] 

Raghu ef al., 
2013 [31] 

Rammaert ef ai, 

2009 [32] 

Richeldi ef al., 
2011 [33] 



Swigris ef al., 
2010 [34] 



Swigris ef al., 
2012 [35] 

Tzanal<is ef al., 
2005 [36] 



Tzouvelel<is ef al., 
2013 [37] 



Pulmonary 
rehabilitation 



Japan 



RCT 



Warfarin 



Standalone n/a 
validation study 

Standalone n/a 
validation study 



RCT 
RCT 
RCT 



Interferon 
gamma b 

Etanercept 
Ambrisentan 



Within-subject trial Pulmonary 
rehabilitation 



RCT 



Nintedanib 
(BIBF 1120) 



US 



145 145 7 = 21.6, C = 25.2 



UK 173 49 48.0 

China 68 68 14.0(14.0) 

Multi-national 330 330 - 



Multi-national 88 



T = 14.7 (19.8), 
C=12.3 (13.5) 



Multi-national 492 492 T= 13.2, C= 10.8 



T-Total = 50.2 (16.3), 
T-Symptoms domain = 56.4 (22.3), 
T-Activity domain = 64.7 (1 7.1 ), 
T-lmpact domain = 39.7 (1 7.5), 
C-Total = 37.8 (22.7), 
C-Symptoms domain = 38.0 (25.8), 
C-Activity domain = 50.4 (25.2), 
C-lmpact domain = 29.9 (23.7) 

T-Total = 45.2 (1 8.0), C-Total = 50.1 (1 7.2) 



Total = 54 (15), 
Symptoms domain = 65 (16), 
Activity domain = 56 (1 5), 
Impact domain =49 (19) 



T-Total = 40.8 (1 8.1 ), C-Total = 42.9 (1 9.4) 
T-Total=44.5 (21.5), C-Total = 405 (21.1) 



France 



13 13 



Multi-national 428 



428 T, 50 mg qd = 16.8 
(15.5), T, 50 mg 
bid = 13.2 (14.4), 
T, 100 mg bid = 144 
(14.4), T, 150 mg 
bid = 12 (14.4), 
C = 16.8 (18) 



Secondary Bosentan 
validation study 



Secondary Sildenafi 
validation study 

Standalone n/a 
validation study 



Within-subject trial Adipose-derived 
stromal cells 

n/a 



Multi-national 158 158 



US 



Greece 



Greece 



Canada 



180 180 24.0 



25 25 31.2 



14 14 - 



137 137 - 



T, 50 mg qd-Total = 43.7 (17.5), 
T, 50 mg bid-Total = 42.5 (17.0), 
T, 100 mg bid-Total = 43.7 (15.6), 
T, 150 mg bid-Total = 40.1 (18.3), 
C-Total = 41 .2 (17.9) 



Total = 44.8 (19.5), 
Symptoms domain = 50.1 (21 .9), 
Activit/ domain = 60.5 (22.8), 
Impact domain = 33.7 (20.5) 

Activity domain = 59.5 (17.5) 



Total = 37.7 (18.9), 
Symptoms domain = 55.9 (25.3), 
Activity domain = 35.2 (21 .4), 
Impact domain = 29.5 (21) 



T = 56.1 (13.2), 
C = 58.7 (19.5) 



T = 594 (15.7), 
C = 48.5 (15.7) 



T = 58.9 (15.2), 
C = 58.7 (16.1) 

82 (34-143) 



66 (1 8) 



T = 53.9 (10.7), 
C = 64.1 (11.3) 

T = 54.7 (14.1), 
C = 63.0 (12.7) 

T = 68.7 (13.1), 
C = 69.9 (13.8) 



T = 33.8 (12.4), 
C = 34.6 (13.4) 



54 (15) 



T = 35.3 (12.6), 
C = 35.9 (10.8) 

T = 42.0 (13.8), 
C = 45.6 (13.3) 



67 (14) 

T, 50 mg qd = 804 
(17.8),T, 50 mg bid = 79.8 
(15.8),T, 100 mg bid = 855 
(19.2),T, 150 mg bid = 79.1 
(18.5), C = 81. 7 (17.6) 



32 (13) 



67.0 (1 2.8) 

55.8 (14.2) 
58.8 (15) 



40.98 (10.1) 



26.3 (6.1) 



61.7 (19.8) 



495 (17.9) 



Table 1 Studies included in this review (Continued) 



Verma et a/., 
2011 [38] 



Yorke et al., 
2010^ [10] 

Yorke et al., 
2011 [39] 



Standalone 
validation study 



Secondary 
validation study 

Standalone 
validation study 



Zimmermann er a/., Standalone 
2007 [40] validation study 



Zisman et al., 
2010 [41] 



Rcr 



Bosentan 



n/a 



n/a 



Sildenafi 



Multi-national 158 158 - 
Multi-national 101 67 - 

Brazil 22 22 - 



US 



180 180 T = 244, C = 224 



Total = 634 (3.7-96.3), 
Symptoms domain = 59.8 (0-97.2), 
Activity domain = 81 .6 (6.0-99.5), 
Impact domain = 54.1 (0-96.4) 



Total = 53 (24), 
Symptoms domain = 61 (23), 
Activity domain = 65 (30), 
Impact domain =41 (24) 

Total = 484 (17.9), 
Symptoms domain = 46.4 (20.3), 
Activity domain = 624 (1 9), 
Impact domain = 43.6 (20.9) 

T-Total = 54.55 (1 6.46), C-Total = 51 .72 (1 5.86) 



61.0 (12.2) 



77 (19.5) 



70.4 (19.4) 



T = 54.9 (14.00), 
C = 58.7 (14.12) 



51.6 (21) 



41.5 (16.2) 



'Sample size reported represents the population in which efficacy was assessed. RCT = randomized controlled trial; T = treatment group; C = comparator group; qd = once daily; bid = twice daily. ^Mean (SD) or 
median [interquartile range] are reported based on availability. ^Data reported refer to the original version of the SGRQ, not the SGRQ-I.

Table 2 Correlation coefficients between SGRQ scores and other patient-reported assessments of health status

Correlations are with the SGRQ symptoms, activity and impact domain scores and the SGRQ total score.

Study                         Measure             Scale                        Symptoms  Activity  Impact    Total

Cross-sectional studies
Chang et al., 1999 [15]       Borg Dyspnea Index  -                            -         -         -         0.56*
Lechtzin et al., 2013 [21]    CQLQ                Total                        0.72*     0.72*     0.81*     0.79*
                                                  Physical complaints          0.50*     0.72*     0.71*     0.77*
                                                  Psychological issues         0.29      0.40      0.52*     0.54*
                                                  Functional ability           0.53*     0.54*     0.55*     0.66*
                                                  Emotional well-being         0.19      0.42      0.57*     0.50*
                                                  Extreme physical complaints  0.38      0.34      0.53*     0.54*
                                                  Personal safety fears        0.05      0.23      0.45*     0.34
Nishiyama et al., 2005 [24]   BDI                 -                            -0.55*    -0.77''   -0.53*    -0.69"
Patel et al., 2012 [27]       K-BILD              Total                        -0.57*    -0.79*    -0.87*    -0.89*
                                                  Psychological                -0.50*    -0.57*    -0.80*    -0.79*
                                                  Breathlessness               -0.59*    -0.84*    -0.80*    -0.85*
                                                  Chest                        -0.55*    -0.54*    -0.79*    -0.78*
Peng et al., 2008 [28]        Dyspnea score       -                            NS        0.58''    0.30*     0.38*
Swigris et al., 2012 [35]     UCSD-SOBQ           -                            -         0.80''    -         -
Yorke et al., 2010# [10]      BDI                 -                            -0.39''   -0.72"    -0.51"    -0.58"
                              SF-36 PCS           -                            -0.52''   -0.74"    -0.63"    -0.71"
                              Borg Dyspnea Index  -                            0.35^     0.45''    0.40"     0.45"
Yorke et al., 2011 [39]       D-12                -                            0.57*     0.78*     0.75*     0.79*
Zimmermann et al., 2007 [40]  BDI                 -                            -0.52*    -0.75*    -0.63*    -0.72*

Longitudinal studies
Nishiyama et al., 2008 [25]   Δ BDI               -                            -         -         -         -0.29
Peng et al., 2008 [28]        Δ Dyspnea score     -                            NS        0.59*     0.55*     0.45*

BDI = Baseline Dyspnea Index; CQLQ = Cough Quality of Life Questionnaire; D-12 = Dyspnea-12; K-BILD = King's Brief Interstitial Lung Disease questionnaire;
SF-36 PCS = SF-36 Physical Component Summary; UCSD-SOBQ = University of California San Diego Shortness of Breath Questionnaire; Δ = change.
*p < 0.05; *p < 0.01; *p < 0.001; ^p < 0.0001; NS = non-significant. #Data reported refer to the original version of the SGRQ, not the SGRQ-I.



to -0.66, and p < 0.05 for all but one). There were 
moderate to strong correlations between the SGRQ activ- 
ity score and the majority of pertinent PFT results (e.g., 
FVC or DLco) or arterial blood gas analysis in all stud- 
ies, while correlations between the SGRQ symptoms or 
impact domain scores and these variables were generally
weak to moderate. Results for FVC, the lung function
parameter regarded as the most statistically useful
physiological indicator of IPF severity, and the one
most frequently used as a primary endpoint in contemporary
clinical trials, were weakly to moderately correlated
with SGRQ total and domain scores (r = -0.34
to -0.45 for the SGRQ total and -0.13 to -0.31 for the
SGRQ domains).

HRCT 

In one study of patients with IPF, investigators assessed
correlations between SGRQ scores and the extent of 



fibrotic abnormalities on HRCT (degree of ground-glass 
opacity [CT-alv], interstitial opacity [CT-fib], and both 
[total score]) (Table 5). Correlations were moderately 
strong between the SGRQ symptoms, impact and total 
scores and CT-alv or total scores (r = 0.34 to 0.42) and 
moderately strong between the SGRQ activity score and 
both the CT-fib and total scores (r = 0.37 to 0.39) [28]. 

Known groups validity 

Although there are no well-established categories of disease
severity in IPF, it may be hypothesized that patients
receiving supplemental oxygen represent patients with 
more severe disease. In two studies, investigators found 
that SGRQ total scores were worse in patients using 
supplemental oxygen versus those not using supplemen- 
tal oxygen [15,38]. In one study by Chang and col- 
leagues, the magnitude of difference between patients 
using versus not using oxygen was 4.7 (p < 0.05) [15].

Table 3 Correlation coefficients between SGRQ scores and the 6MWD as a measure of exercise capacity

Correlations are with the SGRQ symptoms, activity and impact domain scores and the SGRQ total score.

Study                         Measure   Symptoms  Activity  Impact    Total

Cross-sectional studies
Chang et al., 1999 [15]       6MWD      -         -         -         -0.66*
du Bois et al., 2011 [16]     6MWD      -         -         -         -0.26*
Peng et al., 2008 [28]        6MWD      -0.32+    -0.43*    -0.41*    -0.45*
Yorke et al., 2010# [10]      6MWD      -0.14     -0.32''   -0.24*    -0.28*
Yorke et al., 2011 [39]       6MWD      -0.32+    -0.54+    -0.47*    -
Zimmermann et al., 2007 [40]  6MWD      -0.41     -0.72*    -0.63*    -0.72*

Longitudinal studies
du Bois et al., 2011 [16]     Δ 6MWD    -         -         -         -0.231*
Nishiyama et al., 2008 [25]   Δ 6MWD    -         -         -         -0.43*
Peng et al., 2008 [28]        Δ 6MWD    NS        -0.43     -0.46     -0.41*

6MWD = Distance covered in 6-minute walk test; Δ = change.
*p < 0.05; *p < 0.01; *p < 0.001; < 0.0001; NS = non-significant. #Data reported refer to the original version of the SGRQ, not the SGRQ-I.



Test-retest reliability (reproducibility) 

No studies were found that reported data on the test-retest
reliability of the SGRQ in patients with stable IPF.

Minimal important difference 

A triangulation approach has been used to determine an 
MID estimate for SGRQ scores in patients with IPF [34]. 
Using both distribution- and anchor-based approaches 
(using FVC, DLco and the TDI as anchors), the MID for 
the SGRQ symptoms, activity, impact and total scores was 
8, 5, 7 and 7 respectively. 
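
The distribution-based side of such a triangulation is often summarized with simple formulas. The sketch below shows two that are widely used in the patient-reported outcome literature, half a baseline standard deviation and one standard error of measurement (SEM = SD x sqrt(1 - reliability)). Whether the cited analysis [34] used these exact formulas is not stated here, so this is an assumption; the function names are hypothetical, and the inputs (a baseline total-score SD of 18.8 from Table 1 and a reliability of 0.84) are chosen only for illustration.

```python
import math


def half_sd_mid(baseline_sd):
    """Distribution-based MID estimate: half of the baseline SD."""
    return 0.5 * baseline_sd


def sem_mid(baseline_sd, reliability):
    """Distribution-based MID estimate: one standard error of measurement."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return baseline_sd * math.sqrt(1.0 - reliability)


# Illustrative inputs only; not a reproduction of the published estimates.
print(half_sd_mid(18.8))
print(round(sem_mid(18.8, 0.84), 2))
```

Anchor-based estimates, which relate score changes to an external criterion such as FVC or the TDI, would then be weighed alongside figures like these.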

Responsiveness 

The responsiveness of the SGRQ domain and total scores
has been assessed in one study [34]. Using data from a randomized
placebo-controlled trial of bosentan, investigators
assessed the ability of the SGRQ to discriminate among
IPF patients who had experienced an improvement, decline,
or no change in disease status over 6 months, as defined
by three clinical anchors (change in FVC, DLco and
transition dyspnea index [TDI]). With the exception of the
SGRQ symptoms score when DLco was the anchor,
changes in SGRQ domain and total scores differed significantly
between patients who had declined, remained stable,
or improved [34]. Change scores from the SGRQ total and
its domains were reported for the DLco and TDI response
categories and ranged from +3 to +13, +1 to -5, and 0
to -12 for patients who declined, remained stable, or
improved, respectively. The impact domain discriminated best
between all categories of change for all three anchors [34].

SGRQ as an endpoint 

In sixteen trials, investigators used the SGRQ domain
and/or total scores as outcome variables. In four trials,
investigators evaluated the within-subject change in
SGRQ total score from baseline to end of treatment
[22,23,32,37] (Table 6). In all four, improvements were observed
in exercise endurance or FVC; in three of these, there was
a significant decrease in SGRQ total score
from baseline to end of treatment (8-24 weeks).
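
For the two within-subject trials in Table 6 that report means and standard deviations, the tabulated effect sizes appear consistent with a standardized change computed as the baseline minus post-treatment difference divided by the baseline SD. This is an inference from the tabulated numbers rather than a formula stated in the text, and the function name `standardized_change` is hypothetical.

```python
def standardized_change(baseline_mean, post_mean, baseline_sd):
    """Effect size as (baseline - post-treatment) / baseline SD.

    Positive values indicate a decrease (improvement) in SGRQ score.
    """
    if baseline_sd <= 0:
        raise ValueError("baseline SD must be positive")
    return (baseline_mean - post_mean) / baseline_sd


# Values taken from Table 6.
print(round(standardized_change(50.90, 18.40, 8.38), 2))  # Mishra et al. [22]
print(round(standardized_change(35.1, 27.8, 6.8), 2))     # Tzouvelekis et al. [37]
```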

In the remaining 12 trials, investigators assessed 
whether the SGRQ domain and/or total scores differed 
between active and placebo groups (Table 7). In four of 
these [13,17,18,25], statistically significant between-group 
differences for the primary endpoint coincided with statis- 
tically significant between-group differences in at least one 
SGRQ total or domain score (range of between-groups 
difference in SGRQ total score: -6.1 to -13.4). Six studies 
[17,20,26,29-31] reported a lack of statistically significant 
treatment effect in the primary endpoint or SGRQ scores 
(range of between-groups difference in SGRQ total score 
reported in three studies: -0.5 to -3.0; scores were not re- 
ported in three studies). In three studies [19,33,41], the 
primary endpoint was not met, but the SGRQ total or do- 
main scores were significantly different between treatment 
groups (range of between-groups difference in SGRQ total 
score: -3.3 to -6.1). 

Four studies [20,31,33,41] reported changes from base- 
line in SGRQ total score in the placebo group. Adjusting 
for different trial durations, the SGRQ total score in the 
placebo arms of these trials deteriorated (increased) by a 
median of +4.9 (range: 3.2 to 10.6) per 52 weeks. 
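
The duration adjustment described above can be sketched as a simple linear rescaling of each placebo-arm change to a 52-week period, followed by a median across trials. Linear scaling is an assumption, since the exact adjustment method is not described; the function name and the placebo-arm inputs below are hypothetical and are not the values from the four cited trials.

```python
from statistics import median


def annualized_change(change, duration_weeks, weeks_per_year=52):
    """Rescale a change score to a per-year rate, assuming linear change."""
    if duration_weeks <= 0:
        raise ValueError("duration must be positive")
    return change * weeks_per_year / duration_weeks


# Hypothetical (change in SGRQ total score, trial duration in weeks) pairs.
placebo_arms = [(2.0, 26), (5.0, 52), (7.5, 78)]
rates = [annualized_change(change, weeks) for change, weeks in placebo_arms]
print(rates, median(rates))
```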

Floor and ceiling effects 

No studies were found in which investigators reported data 
on floor and ceiling effects for the SGRQ in patients with 
IPF. However, in most studies, the minimum and maximum 
achievable SGRQ total scores (0 and 100, respectively) were

Table 4 Correlation coefficients between SGRQ scores, pulmonary function tests and arterial blood gas analysis

Correlations are with the SGRQ symptoms, activity and impact domain scores and the SGRQ total score.

Study                         Lung function measure   Symptoms  Activity  Impact    Total

Chang et al., 1999 [15]       DLco% predicted         -         -         -         -0.55+
                              FEV1% predicted         -         -         -         -0.46+
                              FVC% predicted          -         -         -         -0.45+
                              TLC% predicted          -         -         -         -0.36+
Nishiyama et al., 2005 [24]   PaO2                    -0.21     -0.48+    -0.29     -0.37*
                              SpO2                    -0.38*    -0.48+    -0.22     -0.37*
                              TLC                     -0.48+    -0.38*    -0.21     -0.36*
                              TLco                    -0.32*    -0.45+    -0.27     -0.39*
                              VC                      -0.35*    -0.36*    -0.15     -0.30
Peng et al., 2008 [28]        DLco% predicted         -0.45''   -0.46''   -0.34+    -0.44+
                              FEV1% predicted         NS        -0.53''   -0.34+    -0.42''
                              PaO2                    NS        -0.54+    NS        -0.32+
                              TLC% predicted          -0.50''   -0.61''   -0.52"    -0.62''
                              VC% predicted           NS        -0.59''   -0.35+    -0.47''
Tzanakis et al., 2005 [36]    FEV1% predicted         -         -         -         -0.50+
                              PaO2 (at rest)          -         -         -         -0.51+
                              PaO2 (at exertion)      -         -         -         -0.60+
                              TLC% predicted          -         -         -         -0.55+
Yorke et al., 2010# [10]      FVC% predicted          -0.27+    -0.31^    -0.30''   -0.34''
                              TLco% predicted         -0.23+    -0.34''   -0.38''   -0.38"
Yorke et al., 2011 [39]       DLco%                   -0.16     -0.37+    -0.28*    -
                              FVC%                    -0.13     -0.16     -0.24*    -
Zimmermann et al., 2007 [40]  DLco% predicted         -0.41     -0.32     -0.39     -0.47*
                              FEV1% predicted         -0.08     -0.57*    -0.52*    -0.57*
                              TLC% predicted          -0.37     -0.65*    -0.58*    -0.66*
                              VC% predicted           -0.14     -0.54*    -0.61*    -0.56*

DLco = diffusion capacity of the lung for carbon monoxide; FEV1 = forced expiratory volume in 1 second; FVC = forced vital capacity; PaO2 = partial pressure of
oxygen dissolved in arterial blood; TLC = total lung capacity; TLco = transfer factor of the lung for carbon monoxide; VC = vital capacity.
*p < 0.05; +p < 0.01; *p < 0.001; ^p < 0.0001; NS = non-significant. #Data reported refer to the original version of the SGRQ, not the SGRQ-I.



outside an interval spanning twice the standard deviation 
around the reported means (Table 1). For the two studies 
in which investigators reported ranges for baseline SGRQ 
total scores, ranges did not include minimum or max- 
imum possible values [24,38], thus confirming the absence 
of floor or ceiling effects in these studies. 
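
The check described in this paragraph can be sketched as follows, assuming that "an interval spanning twice the standard deviation around the reported means" means mean ± 2 SD (the text could also be read as mean ± 1 SD). The function name is hypothetical; the two mean (SD) pairs are baseline SGRQ total scores listed in Table 1.

```python
def extremes_outside_interval(mean, sd, n_sd=2.0, floor=0.0, ceiling=100.0):
    """Return True if both the floor and the ceiling fall outside mean ± n_sd * SD."""
    lower = mean - n_sd * sd
    upper = mean + n_sd * sd
    return lower > floor and upper < ceiling


# Baseline SGRQ total scores, mean (SD), as listed in Table 1.
print(extremes_outside_interval(41.8, 18))  # interval is roughly 5.8 to 77.8
print(extremes_outside_interval(53, 24))    # interval is 5 to 101
```

Under this reading, the second entry is an example of a study in which the maximum score of 100 falls inside the interval, which is consistent with the statement applying to most rather than all studies.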



Conclusions 

Measurement standards and psychometric criteria have
been proposed to assist with choosing an appropriate instrument
to evaluate HRQL in patients with IPF [6,43].
As with any patient-reported outcome measure used in 
the study of any condition, an instrument must have face 



Table 5 Correlation coefficients between SGRQ scores and extent of fibrosis on HRCT

Correlations are with the SGRQ symptoms, activity and impact domain scores and the SGRQ total score.

Study                   HRCT measure  Symptoms  Activity  Impact    Total

Peng et al., 2008 [28]  CT-alv        0.41+     NS        0.34*     0.39+
                        CT-fib        NS        0.37*     NS        NS
                        CT-tot        0.36*     0.39+     0.35*     0.42+

CT-alv = ground-glass opacity; CT-fib = interstitial opacity; CT-tot = total.
*p < 0.01; +p < 0.001; NS = non-significant.



Swigris et al. Health and Quality of Life Outcomes 2014, 12:124
http://www.hqlo.com/content/12/1/124






Table 6 Changes in SGRQ scores in within-subject clinical trials

Study                           Treatment under investigation    Treatment   Sample size    SGRQ total score^a                      Effect   p-value^b
                                                                 duration    Total   IPF    Baseline          Post-treatment        size
Mishra et al., 2011 [22]        Oral doxycycline                 24 weeks    6       6      50.90 (8.38)      18.40 (6.39)          3.88     <0.001
Naji et al., 2006 [23]          Pulmonary rehabilitation         8 weeks     26      19     48.3 [21.5, 82]   39.5 [17.4, 69.4]     0.41     <0.10
Rammaert et al., 2009 [32]      Pulmonary rehabilitation         8 weeks     13      13                                                      NS
Tzouvelekis et al., 2013 [37]   Endobronchial infusion of        6 months    14      14     35.1 (6.8)        27.8 (5.6)            1.07     <0.05
                                adipose-derived stromal cells

^a Mean (SD) or median [range] are reported based on availability.
^b p-value for test of statistical significance between SGRQ score at baseline and post-treatment.
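
The effect sizes in Table 6 appear consistent with dividing the mean change from baseline by the baseline standard deviation. The sketch below checks this under that assumption, reading the Mishra et al. post-treatment mean as 18.40. The function name is hypothetical, and the formula is inferred from the reported numbers; it is not stated in the table.

```python
def standardized_effect_size(baseline_mean, post_mean, baseline_sd):
    """Mean improvement divided by the baseline SD.

    Assumed formula: it is inferred from the values in Table 6 and is not
    given explicitly there. A positive result means the SGRQ score fell
    (improved) after treatment.
    """
    return (baseline_mean - post_mean) / baseline_sd


# Mean (SD) values as listed in Table 6.
mishra = standardized_effect_size(50.90, 18.40, 8.38)
tzouvelekis = standardized_effect_size(35.1, 27.8, 6.8)

print(round(mishra, 2))       # compare with the reported 3.88
print(round(tzouvelekis, 2))  # compare with the reported 1.07
```

The two studies that report a median and range instead of a mean and SD cannot be checked this way.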



validity, internal consistency, test-retest reliability, longi- 
tudinal validity, and minimal floor and ceiling effects in 
the target patient population. 

The constellation of findings from studies identified in
our search revealed that, in patients with IPF, the internal
consistency of the SGRQ activity and impact domains
and the SGRQ total score was excellent. The internal
consistency of the symptoms domain was moderate and,
in most studies, fell below the acceptable threshold of
0.7. The lower internal consistency of the symptoms
domain is likely because it asks about a range of respiratory
symptoms (cough, sputum, shortness of breath, wheezing
and attacks of chest trouble), most of which apply to few
patients with IPF, whose major symptoms are shortness
of breath and cough. In response data, these off-target
items weaken the inter-relatedness among items in this
domain and thus lower its internal consistency. They also
contribute to the lower convergent validity of this domain,
because they weaken the associations between its scores
and clinical measures of IPF severity (e.g., patients may
endorse wheezing or attacks of chest trouble, but these
symptoms are unlikely to be related to a person's FVC).
These off-target (for IPF) items detract from the SGRQ's
face validity and would likely have been removed or
modified in a tool specifically designed for use with
patients with IPF. Thus, the symptoms domain may be
well-suited for patients with COPD, but it is not tailored
to precisely assess symptoms in patients with IPF. The
non-informative noise in the symptoms domain might also
contribute to a less than optimal performance of the
SGRQ total score. Overall, however, despite its weak face
validity in IPF, the symptoms domain performs reasonably
well in this population, and its potential to detract from
the performance of the SGRQ total score is tempered
because it contributes least to the SGRQ total score.
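
Internal consistency is commonly summarized with Cronbach's alpha. Assuming that statistic, the sketch below shows how a single off-target item can pull alpha below the 0.7 threshold. The item scores are invented toy data, the function name is hypothetical, and the items are unweighted, so this does not reproduce SGRQ scoring.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): two items that track each other closely,
# e.g. shortness of breath and cough.
on_target = [[1, 1], [2, 2], [3, 3], [4, 4]]

# Same respondents with a third, unrelated item added, e.g. wheezing.
with_off_target = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 4, 2]]

alpha_on_target = cronbach_alpha(on_target)
alpha_with_off_target = cronbach_alpha(with_off_target)

print(round(alpha_on_target, 2), round(alpha_with_off_target, 2))
```

In this toy example the unrelated item adds its own variance without adding shared variance, which is the mechanism the paragraph above proposes for the symptoms domain.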

Convergent validity analyses seek to determine whether
two measures hypothesized to measure the same construct
do in fact correlate; moderate, statistically significant
correlations in the expected direction support convergent
validity. Very strong or 'perfect' correlations suggest
redundancy in measurement, so moderate correlations
between a patient-reported outcome measure and another
clinical variable support convergent validity of the
patient-reported outcome measure while confirming that it
contributes unique information not captured by the other
clinical variable [5]. The SGRQ has been used as a
secondary endpoint in several clinical trials conducted in
patients with IPF. Among the select few in which the
intervention outperformed placebo, SGRQ results were as
one would anticipate, i.e., SGRQ scores improved in the
group that benefited from the intervention. Although not
a formal assessment of responsiveness, consistency between
the changes in SGRQ scores and the changes in other
endpoints supports responsiveness.

In sum, the limitations of the SGRQ in IPF should be 
noted, as it was not originally developed for use in pa- 
tients with IPF. In particular, this applies to possible 
over-interpretation of results of individual domains. 
However, the cross-sectional correlations between SGRQ 
domain and total scores and other measures of patient- 
reported health status, exercise capacity or lung func- 
tion, along with the ability of the SGRQ to distinguish 
patients who experience a change in clinical status or re- 
main stable over time, support the SGRQ as a useful 
patient-reported outcome measure in IPF. 

Limitations to our research include the following. We
could identify only one study in which MID estimates
for the SGRQ scores in IPF were determined [44]. This
study used a triangulation approach and arrived at an
MID that was higher than that reported for COPD [45],
but more research with additional datasets is needed to
evaluate these estimates. In the meantime, the use of
responder rates of patients experiencing a minimum
change from baseline in SGRQ scores, or perhaps more
informatively, cumulative distribution plots, may be a
useful assessment, as research suggests that it may be
less dependent on the exact cutoff, i.e., the precise
value of the MID [46].
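
To make the responder-rate idea concrete, the following sketch computes the share of patients whose SGRQ total score improved by at least a given number of points, across several candidate cutoffs. All change scores are invented for illustration, and the function and variable names are hypothetical. Tabulating the rates over a range of thresholds is the numerical counterpart of a cumulative distribution plot.

```python
def responder_rate(changes, threshold):
    """Share of patients whose score fell (improved) by at least `threshold` points."""
    responders = sum(1 for change in changes if change <= -threshold)
    return responders / len(changes)


# Hypothetical changes from baseline in SGRQ total score
# (negative values indicate improvement).
treated = [-12, -8, -6, -5, -3, 0, 2, 4]
placebo = [-6, -2, 0, 1, 3, 5, 7, 9]

# Candidate cutoffs standing in for different possible MID values.
thresholds = [4, 5, 6, 7]

curve = {
    t: (responder_rate(treated, t), responder_rate(placebo, t))
    for t in thresholds
}

for t, (rate_treated, rate_placebo) in curve.items():
    print(f"cutoff {t}: treated {rate_treated:.3f}, placebo {rate_placebo:.3f}")
```

If the treated group shows a higher responder rate than placebo at every cutoff, as in this toy data, the comparison does not hinge on the exact value chosen for the MID.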

No articles were identified that evaluated the test- 
retest reliability of the SGRQ in patients with stable IPF. 
Likewise, we could not locate a study in which floor and 
ceiling effects of SGRQ scores were reported, although 
an analysis of the reported baseline mean SGRQ total 
Table 7 Changes in SGRQ scores in randomized controlled trials 



Study Treatment Sample 

duration size, IPF 



Randomized groups 



Change from Change from Change from Change from 

baseline in baseline in baseline in baseline in 

SGRQ symptoms SGRQ activity SGRQ impact SGRQ total 

domain score domain score domain score score 



Antoniou et al., 12 months 50 
2006 [13] 



Han et al., 
2013 [17] 



12 weeks 22 



Horton et al., 
2012 [18] 



Noth et al., 
2012 [26] 



Raghu et al., 
2004 [29] 



Raghu et al., 
2008 [30] 



97 



12 weeks 23 



King, Jr. et al., 6 months 158 
2008 [19] 



King, Jr. et al., 77 weeks 826 
2009 [20] 



Nishiyama et al., 10 weeks 28 
2008 [25] 



28 weeks 145 



1 weeks 330 



1 weeks 88 



Raghu et al., 48 weeks 492 
2013 [31] 



Interferon gamma-1b 
Colchicine 
p-value^ 

Sildenafil (with RVSD) 
Placebo (with RVSD) 
Difference' 
p-value^ 

Sildenafil (without RVSD) 
Placebo (without RVSD) 
Difference' 
p-value^ 
Thalidomide 
Placebo 
Difference' 
p-value^ 
Bosentan 
Placebo 
Difference' 
p-value^ 

Interferon gamma-1b 

Placebo 

p-value^ 

Pulmonary rehabilitation 

No pulmonary rehabilitation - 

Difference' -5.7 [-18.7, 7.2] 



-13.2 [21.4,5.0] 
7.5 [-4.5, 19.5] 
0.01 



-28.0 [-41.7, -14.4] 
<0.0001 



-3.8 [-10.7, 3.0] 
NS 



-12.1 [22.2,2.0] 
0.018 



p-value 
Warfarin 
Placebo 
p-value^ 

Interferon gamma-1b 

Placebo 

p-value^ 

Etanercept 

Placebo 

Difference' 

p-value^ 

Ambrisentan 

Placebo 

p-value^ 



NS 



NS 



-4.8 [-12.7, 3.0] 
4.7 [-12.1, 22.0] 
NS 



-5.6 [-16.1, 5.0] 
NS 



-4.1 [-9.2, 1.1] 
NS 



-3.3 [-9.8, 3.2] 
NS 



-5.8 [-14.7, 3.1] 
NS 



NS 



-1.9 [-9.2, 5.4] -4.7 [-11.4, 2.0] 
4.1 [-6.4, 14.6] 4.8 [-5.9, 15.5] 
NS NS 



-14.0 [-25.6, -2.4] -13.4 [-22.7, -4.2] 
0.02 0.005 



-1.8 [-7.5, 3.9] -3.0 [-7.6, 1.7] 
NS NS 



-13.1 [-19.7, -6.6] -11.7 [-18.6, -4.E 
<0.001 0.001 



-3.3 (2.6) 
0.034 
5.7 (13.5) 
6.2 (14.3) 
NS 



-6.2 [-12.8,0.3] -6.1 [-11.7,0.5] 
NS <.05 



NS 



NS 



NS 



NS 
4.7 
3.0 
NS 
Table 7 Changes in SGRQ scores in randomized controlled trials (Continued)

Study                        Treatment   Sample      Randomized groups        Change from baseline in SGRQ score
                             duration    size, IPF                            Symptoms domain         Activity domain         Impact domain           Total
Richeldi et al., 2011 [33]   12 months   431         Nintedanib 50 mg qd      3.39 (2.51)             7.39 (1.96)             3.71 (2.04)             4.67 (1.78)
                                                     Nintedanib 50 mg bid     2.11 (2.34)             3.54 (1.82)             1.73 (1.90)             2.18 (1.65)
                                                     Nintedanib 100 mg bid    2.33 (2.35)             3.00 (1.83)             0.79 (1.91)             1.48 (1.66)
                                                     Nintedanib 150 mg bid    -3.14 (2.40)            0.32 (1.89)             -0.14 (1.97)            -0.66 (1.71)
                                                     Placebo                  6.45 (2.45)             7.48 (1.91)             4.21 (1.99)             5.46 (1.73)
                                                     p-value^                 <0.005                  <0.005                                          <0.01
Zisman et al., 2010 [41]     12 weeks    180         Sildenafil               -3.58 [-7.02, -0.13]    -1.15 [-3.68, 1.38]     -0.88 [-3.78, 2.02]     -1.64 [-3.91, 0.64]
                                                     Placebo                  2.15 [-1.30, 5.61]      2.49 [0.00, 4.99]       2.82 [-0.03, 5.67]      2.45 [0.17, 4.72]
                                                     Difference'              -5.73 [-10.61, -0.85]   -3.54 [-7.20, -0.09]    -3.70 [-7.76, 0.37]     -4.08 [-7.30, -0.86]
                                                     p-value^                 0.02                    0.04                    NS                      0.01

RVSD = right ventricular systolic dysfunction.
'Difference in change from baseline between treatment groups, mean [95% CI].
^Test of statistical significance for the difference in mean change from baseline between groups.
^Test of statistical significance for the difference in mean change from baseline between the nintedanib 150 mg bid and placebo groups.
"Treatment continued for >12 months (data not available).



scores and their standard deviations suggested that there
was no evidence of either. Furthermore, we did not
assess the content validity of the SGRQ in patients with
IPF, nor did we include analyses of articles published in
languages other than English. Content validity and
cultural adaptation are important factors to consider for
any patient-reported outcome measure, but these topics
were beyond the scope of this evaluation of the SGRQ's
psychometric properties. Therefore, more research on the
SGRQ is needed in this patient population.

The utility of a patient-reported outcome measure
may be assessed only after a wealth of data becomes
available. The assessment involves examining how the
measure performs in the target population under several
circumstances. The cache of available data has greatly
advanced our understanding of HRQL in general, and of
the performance of the SGRQ in patients with IPF in
particular. For example, whilst the mean baseline SGRQ
total score reported in IPF (around 45; interquartile
range: 42-50) is similar to that reported in COPD trials
[47,48], an analysis of the reported changes from baseline
in the SGRQ total score in the placebo arms suggests that
untreated patients with IPF deteriorate by +4.9 points
over a period of 52 weeks. This contrasts with the
experience in COPD, where patients on placebo show an
improvement of 2-3 points per year [46], and reflects the
progressive decline in health status seen in patients with IPF.

Finally, a major factor in this assessment revolves
around how confidently response data from the measure
can be used to make inferences about patients in the
target population. For example, what can be said about a
patient with IPF whose SGRQ score is 50? How does
day-to-day functioning, or how a patient feels, change
for an IPF patient whose SGRQ score increases by 10
over 6 months? Being able to answer these, and similar,
questions confidently and accurately will further and
more strongly support the validity of the SGRQ as an
instrument capable of assessing domains of HRQL in this
population. Until then, the balance of the data suggests
that the SGRQ may be a suitable secondary endpoint for
measuring HRQL in therapeutic trials of IPF.

Additional file 



Additional file 1: PubMed search strategy. 